Fast Effective Rule Induction

Authors

  • William W Cohen
  • AT&T Bell
  • Cameron Jones
Abstract

Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm, RIPPERk, that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.


Similar resources

Comparison among Suffix Array Construction Algorithms

A suffix array is a compact data structure for searching matched strings from text databases. It is an array of pointers and stores all suffixes of a text in lexicographic order. Because its memory requirement is less than that of tree structures, it is effective for large databases. Moreover, constructing the suffix array is used in the Block Sorting compression scheme. We compare algorithms for constructing suffix a...


Algorithms for Segmenting Time Series

As with most computer science problems, representation of the data is the key to efficient and effective solutions. Piecewise linear representation has been used for the representation of the data. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain...


A Margin-based Model with a Fast Local Search for Rule Weighting and Reduction in Fuzzy Rule-based Classification Systems

Fuzzy Rule-Based Classification Systems (FRBCS) are highly investigated by researchers due to their noise-stability and interpretability. Unfortunately, generating a rule base that is both sufficiently accurate and interpretable is a hard process. Rule weighting is one of the approaches to improve the accuracy of a pre-generated rule base without modifying the original rules. Most of the pro...


A decision-tree-based symbolic rule induction system for text categorization

We present a decision-tree-based symbolic rule induction system for categorizing text documents automatically. Our method for rule induction involves the novel combination of (1) a fast decision tree induction algorithm especially suited to text data and (2) a new method for converting a decision tree to a rule set that is simplified, but still logically equivalent to, the original tree. We rep...


Handling Time Changing Data with Adaptive Very Fast Decision Rules

Data streams are usually characterized by changes in the underlying distribution generating data. Therefore algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. Rules are one of the most interpretable and flexible models for data mining prediction tasks. In this paper we present the Adaptive Very Fast Decision Rules (AVFDR), an on-...




Publication date: 1995